Introduce outlines.models.mlxlm
#956
Conversation
outlines/models/mlxlm.py (outdated)

```python
from outlines.generate.api import GenerationParameters, SamplingParameters
from outlines.processors import BaseLogitsProcessor

try:
```
Does that mean the user must have `mlx` installed, whether they want to use this integration or not?
It will attempt the import, but the module will still load fine if `mlx` isn't installed, because the `ImportError` is caught and passed.
I think it’s cleaner to import the libraries directly in the methods/functions where they’re used.
Fixed!
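The lazy-import pattern the reviewers settle on above (importing the optional dependency inside the function rather than at module level) can be sketched roughly as follows; the function body and error message are illustrative, not the actual implementation:

```python
def mlxlm(model_name: str):
    """Illustrative sketch: import mlx_lm only when an mlxlm model is
    requested, so outlines itself imports cleanly without mlx installed."""
    try:
        import mlx_lm  # only needed for this integration
    except ImportError as e:
        raise ImportError(
            "The `mlx_lm` library is required to use mlxlm models: "
            "pip install mlx-lm"
        ) from e
    # mlx_lm.load returns the model and tokenizer for the given repo
    return mlx_lm.load(model_name)
```

With this shape, merely importing the module never touches `mlx`; the dependency is only required at call time.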
Looks good, just one small comment on imports. Should be good to merge once the change has been made.
```python
from outlines import models

model = models.mlxlm("mlx-community/mlx-community/Meta-Llama-3-8B-Instruct-8bit")
```
`mlx-community` is repeated twice

@lapp0 Do we have any means to see verbose information for MLX?
Fixes #918

Introduce new model: `outlines.models.mlxlm`

Details
- `outlines.models.mlxlm`
- `outlines.processors`: logits processors for `generate.regex` and `generate.text` (only used for `mlxlm` for now, but the same logits processors will be used for `transformers` in "Update the `transformers` integration" #806)

Tests:
- `model_mlxlm` tests are skipped if not on Apple Silicon
- `tests/generate/test_generate.py`, which tests mlxlm generation (parametrized alongside transformers and llama-cpp)

Performance

Using `mlx-community/Qwen1.5-1.8B-Chat-4bit` on a Mac Mini M2, all sampling is greedy:
- `outlines.generate.text`: 44.0 tokens / second
- `outlines.generate.regex(model, "a{200}")`: 51.68 tokens / second
- `outlines.generate.regex(model, ".{200}")`: 27.5 tokens / second

The core performance issue with `outlines.generate.regex(model, ".{200}")` is the need to convert a large (~150,000 integer) list into a tensor in the logits processor. To mitigate, we can create a separate issue to ensure the FSM index uses tensors of token IDs, not lists. This will result in `self.fsm.get_next_instruction(self._fsm_state).tokens` being a tensor of token IDs.

Misc
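The cost being described, a per-step list-to-tensor conversion in the logits processor versus a tensor precomputed once, can be sketched with NumPy as a stand-in for the actual tensor library (names and masking logic are illustrative, not the outlines implementation):

```python
import numpy as np

VOCAB_SIZE = 150_000

# Stand-in for what the FSM currently returns: a plain Python list of
# allowed token IDs (here every even ID, purely illustrative).
allowed_list = list(range(0, VOCAB_SIZE, 2))

# The proposed mitigation: convert to a tensor once, outside the hot loop.
allowed_tensor = np.asarray(allowed_list)

def mask_logits_from_list(logits, allowed):
    # Per-step conversion: this list -> tensor copy is the cost in question.
    idx = np.asarray(allowed)
    mask = np.full_like(logits, -np.inf)
    mask[idx] = logits[idx]
    return mask

def mask_logits_from_tensor(logits, idx):
    # No per-step conversion: the precomputed index tensor is reused.
    mask = np.full_like(logits, -np.inf)
    mask[idx] = logits[idx]
    return mask
```

Both produce identical masks; the second simply hoists the conversion out of the decoding loop, which is why storing token IDs as tensors in the FSM index would help.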
Smoke test
Testing Without Apple

I don't own any Apple Silicon devices. Here are some instructions in case anyone else wants to test with a cloud Mac Mini:

How to test outlines mlx:
- install Homebrew
- ensure we're using OpenSSL in Python
- install outlines and mlx_lm
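As a rough sketch, those steps might look like the following on a fresh macOS machine; the exact commands are assumptions, so check the Homebrew and mlx-lm docs for current instructions:

```shell
# 1. Install Homebrew (official installer)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# 2. Install a Homebrew Python so it links against Homebrew's OpenSSL,
#    then verify which SSL library the interpreter is actually using
brew install python
python3 -c "import ssl; print(ssl.OPENSSL_VERSION)"

# 3. Install outlines and mlx_lm
pip3 install outlines mlx-lm
```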